Conversation
There was a problem hiding this comment.
Pull request overview
Adds lightweight in-process tracking of recent lookup durations to help estimate service strain, exposing this data via /status to support the logging/monitoring goals described in #228.
Changes:
- Track recent lookup timings in a bounded in-memory deque.
- Extend
/statusresponse to include recent query timing statistics and samples. - Refactor lookup logging to reuse computed timing values.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| time_end = time.time_ns() | ||
| time_taken_ms = (time_end - time_start)/1_000_000 | ||
| time_taken_ms_solr = (time_solr_end - time_solr_start)/1_000_000 | ||
| recent_query_times.append(time_taken_ms) |
There was a problem hiding this comment.
The comment and variable name suggest you're tracking “Solr query time”, but the value appended is time_taken_ms (overall request time including non-Solr processing). This makes /status's recent_queries.mean_time_ms ambiguous/misleading. Either append time_taken_ms_solr (if you want Solr time) or rename the variables/keys to reflect total request timing (or expose both metrics separately).
| recent_query_times.append(time_taken_ms) | |
| recent_query_times.append(time_taken_ms_solr) |
- Track Solr-only wait time in a separate deque so mean_solr_time_ms can be distinguished from total API time in /status - Pull query handler (requests/errors/timeouts/p75/p95/p99), cache (hitratio/evictions), and JVM (heap %, CPU load) from Solr's /admin/metrics API; fails gracefully with solr_metrics: null - Fix RECENT_TIMES_COUNT env var type cast (int) to prevent deque crash - Replace -1 sentinel with None for empty mean; remove verbose recent_times_ms list from response Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Combine two /admin/metrics calls into one (group=core&group=jvm), halving the round-trip overhead per /status request - Pin core key to name_lookup_shard1_replica_n1 instead of non- deterministic next() iteration over the metrics dict - Move raise_for_status() inside the async with block so all Solr I/O is co-located - Add test_status_shape and test_status_recent_queries_populated tests - Update API.md with recent_queries and solr_metrics response fields Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atus The Solr metrics round-trip is skipped unless the caller explicitly passes ?metrics=true. The solr_metrics key is omitted from the response entirely when not requested. Tests and API.md updated accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
As per #228, this PR tracks recent query time taken as a way of estimating how much strain this system is under. The correct way of doing this is probably via Prometheus (see https://chatgpt.com/s/t_6944784e41e88191a0608351b0809619), but we might be able to make something hacky work using this information. Maybe.
WIP: